Human coronaviruses are important human pathogens and have also been implicated in multiple sclerosis. To further 
understand the molecular biology of human coronavirus 229E (HCV-229E), molecular cloning and sequence analysis 
of the viral RNA have been initiated. Following established protocols, the 3’-terminal 1732 nucleotides of the genome 
were sequenced. A large open reading frame encodes a 389 amino acid protein of 43,366 Da, which is presumably the 
nucleocapsid protein. The predicted protein is similar in size, chemical properties, and amino acid sequence to the 
nucleocapsid proteins of other coronaviruses. This is especially evident when the sequence is compared with that of 
the antigenically related porcine transmissible gastroenteritis virus (TGEV), with which a region of 46% amino acid 
sequence homology was found. Hydropathy profiles revealed the existence of several conserved domains which could 
have functional significance. An intergenic consensus sequence precedes the 5’-end of the proposed nucleocapsid 
protein gene. The consensus sequence is present in other coronaviruses and has been proposed as the site of binding 
of the leader sequence for mRNA transcriptional start. This region was also examined by primer extension analysis of 
mRNAs, which identified a 60-nucleotide leader sequence. The 3’-noncoding region of the genome contains an 11-
nucleotide sequence, which is relatively conserved throughout the Coronavirus family and lends support to the theory
that this region is important for the replication of negative-strand RNA. © 1989 Academic Press, Inc.


INTRODUCTION 


Human coronavirus 229E (HCV-229E) belongs to 
one of two major antigenic groups of human coronavi- 
ruses (MacNaughton, 1981). It shares antigenic rela- 
tionships with other coronaviruses, such as porcine 
transmissible gastroenteritis virus (TGEV), feline infec- 
tious peritonitis virus (FIPV), and canine coronavirus 
(CCV). The other well-characterized human coronavi- 
rus, HCV-OC43, is in a separate antigenic group which 
includes mouse hepatitis virus (MHV) and bovine coro- 
navirus (BCV). Both human coronaviruses are mainly 
respiratory pathogens and have been estimated to 
cause up to 25% of common colds (McIntosh et al.,
1974; Wege et al., 1982). They have also been impli-
cated in gastrointestinal diseases (Resta et al., 1985).
Furthermore, the isolation of coronaviruses bearing an 
antigenic relationship to HCV-OC43 from the central 
nervous system of two patients with multiple sclerosis 
has suggested a possible etiologic relationship be- 
tween human coronaviruses and multiple sclerosis 
(Burks et al., 1980). This possibility is supported by the
observation that neurotropic strains of MHV cause de- 
myelination in the central nervous system of rodents 
(Weiner and Stohlman, 1978). Thus, human coronavi- 
ruses are important human pathogens. 

The structural and biochemical properties of several 
coronaviruses, particularly MHV and avian infectious 
bronchitis virus (IBV), have been well characterized (Lai
et al., 1987; Boursnell et al., 1987). The virion contains
a single-stranded, positive-sense RNA molecule (mo-
lecular weight 6-8 × 10⁶ Da) (Lai and Stohlman, 1978)
associated in a helical conformation with nucleocapsid 
proteins (N). The viral nucleocapsid is enclosed by an 
envelope, in which are embedded at least two types of 
viral proteins, the peplomer (E2) and matrix (E1) glyco- 
proteins. Coronavirus RNA replication occurs in the cy- 
toplasm of infected cells and is mediated by a virus- 
encoded RNA-dependent RNA polymerase (Brayton et 
al., 1982). The virus-specific mRNA in infected cells
comprises a genomic-sized RNA plus six subgenomic 
mRNA species. These mRNAs are arranged in a 
nested-set structure, which is characterized by RNAs 
having common 3’-termini but extending for varying 
lengths in the 5’ direction (Lai et al., 1981). Only the 5’-
proximal regions of each mRNA are translated (Rottier
et al., 1981). A unique feature of the structure of coro-
navirus mRNAs is the existence, at the 5’-end of each mRNA, of
an identical leader sequence. This sequence is derived 
from the 5’-end of the genomic RNA and is approxi-
mately 70 nucleotides in length (Lai et al., 1983, 1984).
Recent evidence has supported a role for the leader 
sequence in mediating a novel type of discontinuous 
transcription of genomic RNA (Baric et al., 1985; Ma-
kino et al., 1986; Shieh et al., 1987).
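
The nested-set arrangement described above can be pictured with a toy model. The following is a minimal sketch only: the genome string, the leader length, and the body start positions are hypothetical values chosen for illustration, not data from this study.

```python
def nested_set(genome: str, leader_len: int, body_starts: list[int]) -> list[str]:
    """Build toy subgenomic mRNAs.

    Each mRNA is the shared 5' leader joined to a body that starts at a
    different internal position and runs to the common 3' end of the genome.
    """
    leader = genome[:leader_len]
    return [leader + genome[start:] for start in sorted(body_starts)]


# Hypothetical genome: a 4-nt "leader" followed by three 4-nt "genes".
toy_genome = "LLLL" + "aaaa" + "bbbb" + "cccc"
mrnas = nested_set(toy_genome, leader_len=4, body_starts=[4, 8, 12])

for m in mrnas:
    print(m)  # every mRNA shares the same 5' leader and the same 3' terminus
```

In this picture only the 5’-proximal gene of each mRNA (the part not present in the next smaller mRNA) would be translated.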

In contrast to other coronaviruses, the molecular bi-
ology of human coronaviruses is relatively poorly un-
derstood. The genomic RNA of both HCV-229E and
HCV-OC43 has a molecular weight of approximately 6
× 10⁶ Da (Hierholzer et al., 1981). The six subgenomic
RNA species appear to have lower molecular weights 
than those of the corresponding MHV RNAs (Weiss 
and Leibowitz, 1981). The structure of these mRNAs is 
not yet known. Analysis of purified HCV-229E virions 
has revealed three major polypeptides: a glycosylated 
protein with a molecular weight of 180 kDa, a phos- 
phorylated nucleocapsid protein of 50 kDa, and a fam- 
ily of polypeptides with molecular weights of 25, 23, 
and 21 kDa (Kemp et al., 1984). In addition, several mi-
nor nonstructural polypeptides of 107, 92, and 39 kDa
have been identified (Kemp et al., 1984). The functions
of these proteins have not yet been characterized. 

To further understand the molecular biology of HCV- 
229E, we have initiated molecular cloning and se- 
quence analysis of HCV-229E RNA. In this paper we 
report the sequence analysis of the gene encoding the 
nucleocapsid protein of HCV-229E. In addition, the 
mRNA leader sequence was also identified. The re- 
sults are compared with sequences of other coronavi- 
ruses including MHV, BCV, IBV, and TGEV. 


MATERIALS AND METHODS 
Virus and cells 


HCV-229E (obtained from Dr. J. Fleming, University 
of Southern California) was propagated at low multiplic- 
ities of infection in human fetal lung cells L132 (Ken- 
nedy and Johnson-Lussenberg, 1975/1976), using 
Dulbecco's modified Eagle’s medium (DMEM) supple- 
mented with 10% fetal calf serum. 


Virus purification and preparation of virion RNA 


Following a virus adsorption period of 1 hr at 37°, 
HCV-229E-infected L132 monolayers were incubated 
at 37° for 24 to 48 hr, at which time the cell culture fluid 
was harvested. Viruses were precipitated from 2 liters 
of culture fluid with 50% ammonium sulfate and centri-
fuged at 8000 rpm for 30 min. The pellet was resus-
pended in NTE buffer (0.1 M NaCl, 0.01 M Tris-hydro-
chloride (pH 7.2), 1 mM EDTA) and then placed on a
discontinuous sucrose gradient consisting of 60, 50, 
30, and 20% (w/w) sucrose in NTE buffer and centri- 
fuged at 26,000 rpm for 13 hr at 4° in a Beckman 
SW28.1 rotor. The virus band at the interface between 
50 and 30% sucrose was collected and diluted three- 
fold with NTE buffer. The diluted virus suspension was 
centrifuged on a linear sucrose gradient at 26,000 rpm 
in an SW28.1 rotor for 4 hr at 4°. The virus band was 
collected and treated with proteinase K (0.2 mg/ml) for 
20 min at 37°, followed by 1% SDS for 30 min at 37°. 
Genomic RNA was extracted with phenol and then with 
phenol/chloroform, and precipitated with ethanol. 


Preparation of intracellular RNA 


Monolayers of L132 cells grown in 100 x 20-mm cul- 
ture dishes were infected with HCV-229E. Cells were 
incubated in phosphate-free DMEM containing 1% dia-
lyzed fetal calf serum 4 hr prior to RNA extraction. Acti-
nomycin D (1 µg/ml) (Sigma) and [³²P]orthophosphate
(70 µCi/ml) (ICN Radiochemicals) were added at 3 and
2 hr, respectively, prior to RNA extraction at 15 hr post-
infection (p.i.). Cells were collected in cold phosphate-
buffered saline and centrifuged at 5000 rpm for 3 min
at 4°. The pellet was mixed with cold 0.5% Nonidet- 
P40 in NTE buffer, incubated for 10 min at 4°, and then 
centrifuged at 5000 rpm for 3 min. The supernatant 
was transferred to a fresh tube containing 1/10 vol of 
10% SDS at room temperature and vortexed briefly. In- 
tracellular RNA was extracted with phenol and phenol/ 
chloroform and precipitated with ethanol. Poly(A)-con-
taining RNA was selected by oligo(dT)-cellulose chro-
matography as previously described (Makino et al.,
1984).

To examine the kinetics of viral mRNA synthesis, in- 
tracellular RNA was extracted from virus-infected L132 
monolayers in 60 × 15-mm culture dishes at 7, 21, 29,
46, and 58 hr postinfection. 


cDNA cloning 


cDNA cloning was performed using a modified
method of Gubler and Hoffman (1983). The poly(A)-
containing RNA extracted from 229E-infected L132
monolayers was precipitated, dried, and resuspended
in 6.72 µl of autoclaved water. The RNA was incubated
with 10 mM methylmercuric hydroxide in an 8 µl total
volume for 10 min at room temperature. First-strand
cDNA synthesis was carried out in a 50-µl reaction mix-
ture containing 60 units RNasin (Promega Biotec), 10
mM MgCl₂, 100 mM KCl, 50 mM Tris-HCl (pH 8.3 at
42°), 10 mM DTT, 1.25 mM dNTPs, 40 µCi [α-³²P]dATP
(3000 Ci/mmol), 28 mM β-mercaptoethanol, and 10 ng
oligo(dT)₁₂₋₁₈ primer. After 5 min at room temperature,
40 units of AMV reverse transcriptase (Life Science)
was added and the mixture was incubated for 1 hr at
42°. The reaction was stopped by adding 4.4 µl of 250
mM EDTA. The products were extracted with phenol/
chloroform and precipitated with ethanol containing
0.3 M ammonium acetate. For second-strand synthe-
sis, the 100-µl reaction mixture contained 5 mM
MgCl₂, 100 mM KCl, 20 mM Tris-HCl (pH 7.5), 50 µg/
ml bovine serum albumin (BSA), 10 mM ammonium
sulfate, 0.15 mM β-NAD, 100 µM dNTPs, 25 units of
Escherichia coli DNA polymerase I, 2 units of E. coli
DNA ligase, and 0.8 units of RNase H. Sequential incu-
bations were for 1 hr at 12° and 1 hr at 22°. The reac-
tion was stopped by the addition of 8.7 µl of 250 mM
EDTA and the products were extracted with phenol/
chloroform and precipitated with ethanol in the pres-
ence of 0.3 M ammonium acetate. Homopolymeric tail-
ing of double-stranded cDNA with poly(C) was carried
out in a 12-µl reaction mixture containing 10 units of
terminal transferase, 200 mM potassium cacodylate,
0.5 mM CoCl₂, 25 mM Tris-HCl (pH 6.9), 2 mM DTT,
250 µg/ml BSA, and 50 µM dCTP at 37° for 4 min. The
dC-tailed double-stranded DNA was annealed to 200
µg of dG-tailed PstI-cut pBR322 plasmid in 20 µl of a
buffer containing 10 mM Tris-HCl (pH 7.4), 100 mM
NaCl, and 0.25 mM EDTA. The mixture was incubated
for 5 min at 68° and then cooled slowly overnight. The
annealed molecules were used to transform E. coli
MC1061 as described (Dagert and Ehrlich, 1979).


Colony hybridization 


Colonies grown on LB/tetracycline plates were incu-
bated at 37° for 12 hr and transferred to Colony/Plaque
Screen disks (New England Nuclear). Bacterial lysis
and DNA fixation were carried out according to the
methods previously described (Grunstein and Hog-
ness, 1975). The disks were prehybridized in a solution
containing 0.2% polyvinylpyrrolidone (MW 40,000),
0.2% Ficoll (MW 400,000), 0.2% BSA, 0.05 M Tris-HCl
(pH 7.5), 1% SDS, 1 M NaCl, 10% dextran sulfate, and
100 µg/ml denatured salmon sperm DNA at 65° for 6
hr. Fragments derived from either the 5’- or 3’-ends of
gene 7 were labeled with ³²P by nick-translation and
added to the solution. Hybridization was carried out for
20 hr at 65°. The disks were then washed twice in 2×
SSC (0.3 M NaCl, 30 mM sodium citrate) at room tem-
perature, twice in 2× SSC containing 1% SDS for 30
min at 65°, and twice in 0.1× SSC at room temperature
for 30 min. The disks were air-dried and exposed to X-
ray film at -70°.
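
The wash stringencies above follow from simple dilution arithmetic on the stated 2× SSC composition. The sketch below assumes that the final wash strength is 0.1× SSC (the multiplier reading of the value given in the text); the function name is illustrative.

```python
# 2X SSC is given in the text as 0.3 M NaCl, 30 mM sodium citrate,
# so 1X SSC corresponds to 150 mM NaCl, 15 mM sodium citrate.
SSC_1X_MM = {"NaCl": 150.0, "sodium citrate": 15.0}


def ssc_composition(strength: float) -> dict[str, float]:
    """Return millimolar concentrations for a given SSC strength (2 means 2X)."""
    return {solute: round(mm * strength, 6) for solute, mm in SSC_1X_MM.items()}


for strength in (2, 0.1):
    print(strength, ssc_composition(strength))
```

The 20-fold drop in salt between the 2× and 0.1× washes is what raises the stringency of the final wash.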


Northern hybridization 


Intracellular RNA from virus-infected cells was dena-
tured by glyoxal treatment and separated by electro-
phoresis on a 1% agarose gel containing 10 mM so-
dium phosphate (pH 7.0) as described previously (Mc-
Master and Carmichael, 1977). RNA transfer to
Biodyne nylon filters (ICN Radiochemicals) and subse- 
quent hybridization were performed according to the 
method described by Thomas (1980). 


Primer extension 


A synthetic oligodeoxyribonucleotide was 5’-end-la-
beled with [γ-³²P]ATP by polynucleotide kinase (Peder-
sen and Haseltine, 1980). The total amount of
poly(A)-containing RNA extracted from 229E-infected
cell monolayers in three 150 × 20-mm culture dishes
was incubated in 8 µl of distilled water containing 10
mM methylmercuric hydroxide for 10 min at room tem-
perature. A further incubation was carried out in a 50-
µl reaction volume containing 60 units of RNasin (Pro-
mega), 10 mM MgCl₂, 100 mM KCl, 50 mM Tris-HCl
(pH 8.3 at 42°), 10 mM DTT, 1.25 mM dNTPs, 28 mM
β-mercaptoethanol, 5’-end-labeled synthetic oligo-
deoxyribonucleotides, and 20 units of AMV reverse
transcriptase (Life Science) for 1 hr at 42°. Reaction
products were extracted with phenol/chloroform, pre-
cipitated with ethanol, and then analyzed by electro-
phoresis on a 6% polyacrylamide gel containing 8.3 M
urea. The primer-extended product was identified by
autoradiography and eluted from the gel according to
the published procedure (Maxam and Gilbert, 1977).


DNA sequencing 


Sequencing was carried out by the dideoxyribo-
nucleotide chain termination method (Sanger et al.,
1977) as well as the chemical modification procedure
(Maxam and Gilbert, 1977). In the first method, frag-
ments of cDNA inserts generated by various restriction
endonucleases were cloned into the M13 vectors
mp18 and mp19 (Messing and Vieira, 1982). [α-³⁵S]-
dATP was used as a label. Sequence data were also
obtained by chemical modification (Maxam and Gilbert, 
1977) of various cDNA fragments subcloned into the 
pT7-3 vector (Tabor and Richardson, 1985). In the sec- 
ond method, cDNA fragments were 3’-end-labeled with 
Klenow fragment at internal restriction sites or, alterna- 
tively, at the polylinker cloning site of pT7-3. End-la- 
beled cDNA restriction fragments were separated by 
electrophoresis on preparative polyacrylamide gels 
(Maxam and Gilbert, 1980) and purified as described 
previously (Hansen et al., 1980; Hansen, 1981). Se-
quencing of the primer-extended product of mRNA7
was performed by the chemical modification proce- 
dure (Maxam and Gilbert, 1977). Sequence analysis 
was performed by the Intelligenetics and Seqaid pro- 
grams. Hydropathy profiles were constructed using the 
PepPlot program of the University of Wisconsin Com-
puter Genetics Group, which employs both the Kyte-
Doolittle (KD) and Goldman, Engelman, Steitz (GES) al-
gorithms.
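
A hydropathy profile of the Kyte-Doolittle type is a sliding-window average of per-residue hydropathy indices. The following is a minimal sketch of that calculation, not a reimplementation of PepPlot: the 9-residue window and the test peptide are assumptions made for illustration, and the GES scale is omitted.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> list[float]:
    """Mean Kyte-Doolittle index over each window of consecutive residues."""
    scores = [KD[res] for res in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Hypothetical peptide: hydrophobic N-terminal half, charged C-terminal half.
toy_peptide = "MKSLIVLLAGSRKDE"
profile = hydropathy_profile(toy_peptide, window=9)
print([round(v, 2) for v in profile])
```

Positive values mark hydrophobic stretches and negative values hydrophilic ones; comparing the contours of two such profiles is the basis for the domain comparisons made later in the paper.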


RESULTS 
Kinetics of HCV-229E mRNA synthesis 


To determine the optimum time for extracting 229E-
specific mRNAs, we first studied the kinetics of virus-
specific mRNA synthesis. Intracellular RNA was ex-
tracted from infected L132 monolayers at specified
times p.i. The RNA was separated by agarose gel elec-
trophoresis (Fig. 1). As can be seen, viral mRNA syn-
thesis could be detected as early as 7 hr p.i. and
reached maximum at 29 hr p.i. Thereafter, total RNA
synthesis gradually declined. By 46 hr p.i. only the most
abundant mRNA species were evident. The number
and size of these mRNA species are comparable to
those of MHV mRNAs and are in agreement with pre-
viously published results (Weiss and Leibowitz, 1981).
Significantly, mRNA 2a, which was previously found
only in BCV-infected cells and proposed to encode
hemagglutinins (King et al., 1985; Keck et al., 1988),
was not present. This is consistent with the finding that
HCV-229E does not have hemagglutinating activity
(Hierholzer, 1976). The relative amounts of the mRNA
species were the same throughout the replication cy-
cle. Therefore, in all of our subsequent experiments,
the virus-specific intracellular RNAs were extracted at
15 hr p.i.

Fig. 1. Kinetics of synthesis of HCV-229E-specific RNAs. Intracel-
lular RNA from HCV-229E-infected L132 cell monolayers was la-
beled with [³²P]orthophosphate, extracted, and separated by aga-
rose gel electrophoresis as described under Materials and Methods.
RNA was extracted at the following times: lane A, 7 hr p.i.; lane B,
21 hr p.i.; lane C, 29 hr p.i.; lane D, 46 hr p.i.; lane E, 58 hr p.i. The
positions and designations of HCV-229E-specific RNAs are indi-
cated by the numbers on the left side of the figure.


Molecular cloning of HCV-229E genomic RNA and 
intracellular virus-specific mRNAs 


cDNA cloning was initially performed using virion ge-
nomic RNA as a template. The sizes of inserts in the
resultant cDNA clones ranged from 0.2 to 0.5 kb in
length. One clone, A34, contained a 0.45-kb insert,
which was subsequently characterized by restriction
mapping and Northern blot analysis. The 0.45-kb frag-
ment was labeled with ³²P by nick-translation and hy-
bridized with intracellular RNA from 229E-infected
cells. The result, shown in Fig. 2, revealed that the frag-
ment hybridized to each of the mRNA species. This re-
sult suggested that the HCV-229E subgenomic
mRNAs possess a nested-set structure similar to other
coronaviruses (Lai, 1988) and that A34 represented a
cDNA clone of either the 3’-end of the genomic RNA or
the leader sequence.

Cloning was subsequently carried out using intracel-
lular RNA from 229E-infected cells as a template. The
resulting cDNA clones were screened by colony hy-
bridization using the 0.45-kb fragment from clone A34
as a nick-translated probe (Fig. 3). Several positive col-
onies were identified and characterized further. Clone
L8 contained a 3.6-kb insert but lacked a 3’-poly(A) tail.
Clone L37, which contained an insert of 1.7 kb, over-
lapped L8 but was 0.1 kb shorter at the 3’-end. This
clone also lacked a poly(A) sequence (see below).
Therefore, additional cDNA clones were isolated using
a 0.24-kb BalI-EcoRI fragment of L8 (Fig. 3a) as a
probe. These latter clones were further characterized
by Southern blot analysis. Clone S10 contained an in-
sert of 0.8 kb which overlapped the 3’-ends of the two
previous clones and extended another 0.4 kb in that
direction. Figure 3b shows the orientation and sizes of
clones L8, L37, S10, and A34 with reference to the viral
genome. Restriction enzyme sites used for sequencing
are also shown.

Fig. 2. Northern blot analysis of HCV-229E-specific intracellular
RNA hybridized with clone A34. Intracellular RNA from either unin-
fected (lane A) or HCV-229E-infected (lane B) L132 monolayers was
denatured by glyoxal treatment, separated on a 1% agarose gel, and
transferred to Biodyne nylon filters as described under Materials and
Methods. The positions and designations of the HCV-229E-specific
RNAs are indicated by the numbers on the right side of the figure.

Fig. 3. Diagram of the 3’-end of the HCV-229E genome, including cDNA clones and sequencing strategy. (a) Restriction map of the 3’-end
cDNA clones with reference to the entire viral genome. (b) Relative position and size of the cDNA clones which were sequenced. Clones L8
and L37 are shown in part. Clone A34 was used in colony hybridization studies. (c) Direction and extent of sequence data obtained from
subcloned fragments. Arrows with solid squares indicate dideoxy sequencing method. Arrows with open circles indicate chemical modification
sequencing method. Arrows with open diamonds indicate sequencing, by chemical modification, of fragments subcloned into the PstI-SmaI
sites of pT7-3 and 3’-end-labeled with Klenow fragment. The arrow with a closed circle indicates a DNA fragment which was labeled at the 3’-
end using radiolabeled cordycepin and terminal transferase. Abbreviations: B, BalI; E, EcoRI; H, HindIII; P, PstI; R, RsaI.


Sequencing of the cDNA clones 


To determine the sequence of the 3’-end of the HCV-
229E genome, various restriction fragments of L8, L37,
and S10 were subcloned into M13 vectors. For L8, only
the 1.2-kb fragment extending from an internal PstI site
toward the 3’-end was sequenced. Clone L37 was also 
sequenced in part. Figure 3c shows the cDNA frag- 
ments and strategy used in sequencing. Each region 
5 ' -CGGAAGGTCCGTAAATTCACAAAATAGCACAGGCTGGGTTTTCTACGTACGAGTAAAACACGGTGATTTTTCTGCAGTGAGCTCTC 86 
MATVKWAODA S 10 

87 CCATGAGCAACATGACAGAAAACGAAAGATTGCTTCATTTTTICTAAACTGAACGAAAAGATGGCTACAGTCAAATGGGCIGATGCATCT 176 
11 EPQRGRQGRIPYSLYSPLLVDSEQSWKVIP 40 
177 GBACCACAACGTGGTCGTCAGGGTAGAATACCTIATTCTCTTTATAGCCCTITGCTTGITGATAGTGAACAATCTTGGAAGGIGATACCT 266 
4. RNUvV PINK KODKNKOLIGyYWNVQKRERTRKGK 70 
267 CGTAATCTGGTACCCATCAACAAGAAAGACAAAAATAAGCTTATAGGCTATTGGAATGTTCAAAAACGITTCAGAACTAGAAAGGGCAAA «356 
7.1 RVODULS PKLUHFYYLGTGCPHKDAKFRERVEGY 100 
357 CGGGTGGATTTGTCACCCAAGCTGCATTTTTATTATCTTGGCACAGGACCCCATAAAGATGCAAAATTTAGAGAGCGTGITGAAGGTGTC 446 
101 vWVAVODGAKTEPTGHGARRKNS EPEIP HEN 130 
447 GICTGGGTTGCTGTTGATGGTGCTAAAACTGAACCTACAGGCCACGGCGCCAGGCGCAAGAATTCAGAACCAGAGATACCACACTTICAAT 536 
131 QKLPNGVTVVEEP DS RAPS RS S RS QS R GP 160 
537 CAAAAGCTCCCAAATGGTGTTACTGTTGTTGAAGAACCTGACTCCCGTGCTCCTTCCCGGTCTCAGICGAGGTCGCAGAGTCGCGGTCCT 626 
161GEs KP QS RNPSSDRYHNSQODODIMKAVAAAL 190 
627 GGTGAATCCAAACCTCAATCTCGGAATCCTTCAAGTGACAGATACCATAACAGTCAGGATGACATCATGAAGGCAGTTGCTGCGGCICTT 716 
191K S LGFDKePQEKDKKSAKTGMTPKPSRNOQOSPA 220 
707 AAATCTTTAGGTTTTGACAAGCCTCAGGAAAAAGATAAAAAGTCAGCGAAAACGGGTACTCCTAAGCCTICTCGTAATCAGAGTCCTGCT 806 
221 $ 8 TS AK S LARS QS S ET KEQK HE TI EK PRWK 250 
797 TCTTCTCAAACTTCTGCCAAGAGTCTTGCICGTTCTCAGAGTTCTGAAACAAAAGAACAAAAGCATGAAATCGAAAAGCCACGGTGGAAA —«- 896 
21 RQPNDDVTSNVTQOCFGPRDLDHNFGSAGVV 280 
897 AGACAGCCTAATGATGATGTGACATCTAATGTCACACAATGTTTTGGCCCCAGAGACCTTGACCACAACTTTGGAAGTGCAGGTGITGTG 986 
281 AN GV KA KG¥Y~?P FAELVPSTAAMULFODS HIV 5S 310 
987 GCCAATGGTGTTAAAGCTAAAGGCTATCCACAATTTGCTGAGCTTGTGCCGTCAACAGCTGCTATGCTGITTGATAGTCACATTGITTCC 1076 
311 KESGNTVVLUTFTTRVTVPKDAPHLUGKFULEE 340 
1077 AAAGAGTCAGGCAACACTGTGGTCTTGACTTTCACTACTAGAGTGACTGTGCCCAAAGACCATCCACACTTGGGTAAGTTICTTGAGGAG 1166 
341 LNA FTREMQQQOPLLUNPSALEFNPSQTSPAT 370 
1167 TTAAATGCATTCACTAGAGAAATGCAACAACAGCCTCTTCTTAACCCTAGTGCACTAGAATTCAACCCATCTCAAACTTCACCTGCAACT 1256 
371 A EPVRODEFSIETDIIDEVNZ 389 
1257 GCTGAACCAGTGCGTGATGAATTTTCTATTGAAACTGACATAATTGATGAAGTAAACTAAACATGCCACTGTGTTGTTIGAAATTCAGGC 1346 
1347 TTTAGTTGGAATTTTGCTTTTGCICTIGCTITTATTATCTTTCTTTAATACATTGCTTTICTCTGATCTAIGTATGATGGTACGATCAGA 1436 
1437 GCTACTTTTAATTAACATGATCCCTTGCITTGGCTTGATAAGGATCTAGTCTTATACACAATGGTAAGCCAGTGGTAGTAAAGGTATAAG 1526 
1527 AAATTTGCTACTATGTTACTGAACCTAGGTGAACGCTAGTATAACTCATTACAAATGTGCTGGAGTAATCAAAGATCGCATTGACGAGCC 1616 
1617 AACAATGGAAGAGCCAGTCATTTGTCTTGAGACCTATCTAGTTAGTAACTGCTAATGGAACGGTTTCGATATGGATACAC-POLY (A)-3' 1696 
Fig. 4. The primary nucleotide sequence of the 3’-end of HCV-229E RNA and the deduced amino acid sequence of the nucleocapsid protein.
A primer extension study was carried out using a synthetic oligodeoxyribonucleotide complementary to an 18-mer sequence underlined near
the 5’-end of the gene. The 3’-noncoding region contains a conserved sequence which is shown by the double line. The intergenic conserved
sequence, TCTAAACT, is also shown (dotted line).


was verified by dideoxy chain termination sequencing 
of both strands or by the chemical modification 
method. Clone S10 was found to have a poly(A) stretch
of 34 bases. Figure 4 shows the complete DNA se- 
quence with a translation of the main open reading 
frame (ORF) in one-letter amino acid code. This ORF 
extends from base 147 to base 1313 and predicts a 
389 amino acid protein with a molecular weight of 
43,366 Da. This predicted molecular weight is slightly 
smaller than the measured molecular weight of the nu- 
cleocapsid protein of HCV-229E, which is 50 kDa as 
determined by SDS-polyacrylamide gel electrophore-
sis (MacNaughton, 1980). The difference is probably
due to phosphorylation or other modification of the pro- 
tein. The predicted protein shares features with the nu- 
cleocapsid proteins of TGEV, MHV, BCV, HCV-OC43, 
and IBV (Kapke and Brian, 1986; Skinner and Siddell,
1984; Armstrong et al., 1983; Lapps et al., 1987; Ka-
mahora et al., 1988; Boursnell et al., 1985). Namely,
the protein is highly basic and rich in serine residues. 
Sixty percent of the amino acid residues are basic and 
12% are acidic. There are 39 serine residues (10% of 
total), which are presumed to be sites of phosphoryla- 
tion (Stohlman and Lai, 1979). When compared to 
TGEV, with which HCV-229E shares antigenic proper- 
ties, both N proteins have identical amounts of basic 
and acidic amino acids and serine residues and similar 
molecular weights (Kapke and Brian, 1986). 
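
The ORF coordinates and protein size quoted above can be cross-checked with inclusive-coordinate arithmetic. This is a small sketch using only the figures stated in the text (bases 147 to 1313, 389 residues, 43,366 Da); the function name is illustrative.

```python
def codon_count(first_base: int, last_base: int) -> int:
    """Number of codons in an inclusive, 1-based nucleotide span."""
    length = last_base - first_base + 1  # both end positions are included
    if length % 3 != 0:
        raise ValueError("span is not a whole number of codons")
    return length // 3


codons = codon_count(147, 1313)
mean_residue_mass = 43366 / codons  # predicted mass divided by residue count

print(codons, round(mean_residue_mass, 1))
```

The span is 1167 nucleotides, or 389 codons, which matches the stated protein length and implies that the quoted coordinates cover the coding codons without the termination codon. The mean residue mass of about 111 Da is in the range expected for a typical protein.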

Figure 5 shows a schematic diagram of the possible
ORFs obtained by translating the nucleotide sequence.
The ORF in frame 3 is likely the one which encodes the
nucleocapsid protein. In frame 2, the 5’-flanking region
probably contains part of the sequence of the matrix
protein encoded by gene 6. This possibility is sup-
ported by the finding that reading frame 2 remains open
at the extreme 5’-end. Furthermore, the sequence
TCTAAACT, which is found in the intergenic regions of
several other coronaviruses (Kapke and Brian, 1986;
Skinner and Siddell, 1984; Armstrong et al., 1983;
Lapps et al., 1987; Kamahora et al., 1988; Budzilowicz
et al., 1985), is also present between the presumed ini-
tiation codon of the main ORF and the 3’-end of gene
6. This sequence is the proposed site of fusion of the
leader sequence with the mRNA coding region (Shieh
et al., 1987; Makino et al., 1986; Budzilowicz et al.,
1985).

Fig. 5. Schematic diagram of the possible open reading frames
obtained when translating the primary nucleotide sequence. Vertical
lines above the baseline represent potential initiation codons. Termi-
nation codons are indicated by vertical lines below the baseline.
Frame 3 depicts a single, long ORF encoding the nucleocapsid pro-
tein. ORFs which are greater than 30 amino acids are also shown.
Those lacking translation start sites are indicated by dashed lines.

The 3’-noncoding region contains the sequence 
TGGAAGAGCCA, 75 nucleotides from the 3’-end (Fig. 
4), which is relatively conserved among coronaviruses 
and is found at approximately the same location in all 
of these viral genomes (Kapke and Brian, 1986; Skinner 
and Siddell, 1984; Armstrong et al., 1983; Lapps et al.,
1987; Kamahora et al., 1988; Boursnell et al., 1985)
(Table 1). There is only one nucleotide difference in this 
conserved sequence when it is compared with that of 
TGEV, BCV, and HCV-OC43. Two and three nucleotide
differences are found in IBV and MHV, respectively. 
This conservation of sequence and location suggests 
that it may be important for viral RNA replication. 
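
The nucleotide differences quoted above can be reproduced by a position-by-position comparison of the 11-nucleotide sequences listed in Table 1. The sketch below uses those sequences as given; the function name is illustrative.

```python
# 3'-noncoding conserved sequences as listed in Table 1.
CONSERVED = {
    "HCV-229E": "TGGAAGAGCCA",
    "TGEV": "TGGAAGAGCTA",
    "BCV": "GGGAAGAGCCA",
    "HCV-OC43": "GGGAAGAGCCA",
    "IBV": "GGGAAGAGCTA",
    "MHV-JHM": "GGGAAGAGCTC",
    "MHV-A59": "GGGAAGAGCTC",
}


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


reference = CONSERVED["HCV-229E"]
differences = {
    virus: hamming(reference, seq)
    for virus, seq in CONSERVED.items()
    if virus != "HCV-229E"
}
print(differences)
```

The counts agree with the text: one difference for TGEV, BCV, and HCV-OC43, two for IBV, and three for the MHV strains.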

In frame 1, there are several additional ORFs of at
least 30 amino acids. Some of these, including one 
found in the 3’-noncoding region, lack appropriate 
translation start sites. Another long internal ORF is 
found from base 322 through 693. This contains an 
appropriate initiation sequence and encodes a hypo- 
thetical protein of 13,974 Da, which is rich in leucine 
residues (17%). The significance of this ORF remains 
to be defined. 
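
An ORF survey of the kind summarized in Fig. 5 can be sketched as a scan of the three forward reading frames for stop-free stretches, noting whether each contains an ATG. This is a simplified illustration under stated assumptions, not the program used in the study: the 30-codon threshold follows the text, while the function, its output fields, and the toy sequence are hypothetical. An open stretch whose first ATG leaves fewer than the threshold number of codons is simply not reported.

```python
STOPS = {"TAA", "TAG", "TGA"}


def scan_orfs(seq: str, min_codons: int = 30) -> list[dict]:
    """List stop-free stretches of at least min_codons in frames 1-3.

    A stretch that contains an ATG is reported from its first ATG with
    has_start=True; a stretch with no ATG is reported in full with
    has_start=False (compare the dashed lines of Fig. 5).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        stretch_start = 0
        for idx in range(len(codons) + 1):
            # Close the current stretch at a stop codon or at the sequence end.
            if idx == len(codons) or codons[idx] in STOPS:
                stretch = codons[stretch_start:idx]
                has_start = "ATG" in stretch
                offset = stretch.index("ATG") if has_start else 0
                n = len(stretch) - offset
                if n >= min_codons:
                    orfs.append({
                        "frame": frame + 1,
                        "start": frame + 3 * (stretch_start + offset) + 1,  # 1-based
                        "codons": n,
                        "has_start": has_start,
                    })
                stretch_start = idx + 1
    return orfs


# Hypothetical toy sequence: ATG, four alanine codons, then a stop.
toy = "ATG" + "GCT" * 4 + "TAA"
hits = scan_orfs(toy, min_codons=5)
print([h for h in hits if h["has_start"]])
```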


Leader sequence of HCV-229E 


The mRNAs of coronaviruses contain a stretch of 
leader sequence which is derived from the 5’-end of the
viral genome and exhibits homology with the intergenic
consensus sequence (Shieh et a/., 1987; Budzilowicz 
et a/., 1985). Since our cDNA clones did not appear to 
contain leader sequences, we used primer extension 
studies to determine the sequence of the HCV-229E 
leader RNA. A synthetic oligodeoxyribonucleotide
which was complementary to an 18-mer sequence lo- 
cated near the 5’-end of the gene (Fig. 4) was end-la- 
beled and used in a primer extension study with 
poly(A)-selected intracellular mRNA as a template. The 
reaction products, separated by agarose gel electro- 
phoresis, revealed six bands (data not shown). Since 
these bands were most likely to represent the primer-
extended products of the individual mRNA species, the
smallest and most abundant band, corresponding to 
the primer-extended product of mRNA7, was eluted 
and sequenced by the chemical modification method 
(Maxam and Gilbert, 1977). The sequence of the 3’-end 
of the primer-extended product was identical to the L8 
sequence from nucleotides 129 to 171. At nucleotide 
128, immediately 5’ to the proposed leader mRNA fu- 
sion site, the sequence diverged from the L8 sequence 
and revealed a putative 60-base leader sequence 
which is shown in Fig. 6. The figure also shows a de- 
gree of homology with the leader sequence of IBV. 
Considerably less homology exists between the leader 
sequence of HCV-229E and those of HCV-OC43 and 
MHV-JHM (data not shown). 
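
The logic of locating the leader-body junction, reading back from the primer toward the 5’-end until the mRNA-derived sequence departs from the cloned genomic sequence, can be sketched as a common-suffix comparison. Only the TCTAAACT motif below comes from the text; the flanking bases in both toy sequences are hypothetical.

```python
def common_suffix_len(a: str, b: str) -> int:
    """Length of the longest shared 3'-terminal stretch of two sequences."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


# Toy sequences written 5'->3', both ending at the primer-binding region.
mrna_5prime = "GGGGCCCC" + "TCTAAACTGAACG"     # leader joined to the gene body
genomic_region = "ATATATAT" + "TCTAAACTGAACG"  # upstream genomic sequence + body

shared = common_suffix_len(mrna_5prime, genomic_region)
leader_specific = mrna_5prime[:len(mrna_5prime) - shared]

print(shared, leader_specific)
```

Everything 5’ of the shared stretch in the mRNA-derived sequence is attributed to the leader, which is how a point of divergence such as nucleotide 128 is identified.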


DISCUSSION 


This report presented the primary sequence of the 
nucleocapsid gene and leader sequence of HCV-229E. 
When compared to the known sequences of other cor-
onaviruses (Kapke and Brian, 1986; Skinner and Sid-
dell, 1984; Armstrong et al., 1983; Lapps et al., 1987;
Kamahora et al., 1988; Boursnell et al., 1985), com-
mon features of coronavirus nucleocapsid proteins
emerged; namely, they are highly basic and have a high 
proportion of serine residues, which have been shown 


TABLE 1

CONSERVED SEQUENCE AT THE 3’-NONCODING REGION OF CORONAVIRUS

Virus       3’ conserved sequence
HCV-229E    TGGAAGAGCCA (75)
TGEV        TGGAAGAGCTA (76)
BCV         GGGAAGAGCCA (79)
HCV-OC43    GGGAAGAGCCA (79)
IBV         GGGAAGAGCTA (81)
MHV-JHM     GGGAAGAGCTC (82)
MHV-A59     GGGAAGAGCTC (82)

Note. Number in parentheses indicates distance, in nucleotides, of
the conserved sequence from the poly(A) region.
10 20 30 40 50 60 
| | | 
HCV-229E 5 '-CTTAAG* TACCTTAT*CTATCTA* CAAATAGAAAAG * * TIGCTTTTTAGACTTTGTGTC*TA*CTIC 
IBV 5 '-ACTTAAGATAGATATTAATATATATCTATTACACTAGCCTTGC**GCTAGATTTTTAA*CTTAACAAA..... 


Fig. 6. HCV-229E mRNA leader sequence compared to the leader sequence of IBV. The IBV leader extends for at least 16 nucleotides in the
3’ direction.


to be sites of phosphorylation (Stohlman and Lai, 
1979). The relationship between the nucleocapsid 
genes of HCV-229E and TGEV is particularly interest- 
ing since the viruses are antigenically related (Mac- 
Naughton, 1981). The predicted molecular weights of 
the N protein and the number of potential phosphoryla- 
tion sites of both viruses are almost identical. Although 
these two viruses have little nucleotide sequence ho- 
mology between their nucleocapsid genes, the amino 
acid sequences are homologous within a limited re- 
gion. Amino acid sequence analysis revealed several 
structural features common to both viruses, which may 
have functional significance. For instance, there is a 
region of 46% homology within the amino-terminal 
one-third of the protein which extends from residues 
29 to 134 in HCV-229E, and 41 to 146 in TGEV. Fur- 
thermore, approximately 10 amino acids downstream 
from the homologous region in both proteins lies an 
area which is abundant in serine residues, suggesting 
that this may be an important functional domain of the 
molecule. To further examine such functional homol- 
ogy between the two proteins, hydropathy profiles 
were constructed (Fig. 7). The contour of these plots 
suggests that a certain degree of functional homology 
exists within the first and last one-third of each mole- 
cule, with an additional region around position 200. 
Fig. 7. Hydropathy profiles of coronavirus N proteins. Both the K-D (solid line) and GES (dashed line) curves are depicted with scales on the
right and left, respectively.
The peak around position 200 occurs just after the ser-
ine-rich region of the molecule. The relative conserva- 
tion of these regions suggests a possible role in the 
interaction of the N protein with the viral genome. Sim- 
ilar structural features exist among the N proteins of 
HCV-229E, IBV, MHV, HCV-OC43, and BCV (Skinner 
and Siddell, 1984; Lapps et al., 1987; Kamahora et al.,
1988; Boursnell et al., 1985). This is demonstrated by
the hydropathy profiles of these proteins, which are 
also shown in Fig. 7. Further studies are required to 
reveal the functional significance of the conserved do- 
mains. 
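
A figure such as the 46% homology reported for the amino-terminal region is, in its simplest form, a count of matching positions over an aligned span. The sketch below assumes an ungapped, position-by-position comparison, which the equal lengths of the two quoted residue ranges make possible; whether the original figure counted identities only is not stated here, and the two short peptides are hypothetical.

```python
def region_length(first: int, last: int) -> int:
    """Number of residues in an inclusive, 1-based range."""
    return last - first + 1


def percent_identity(a: str, b: str) -> float:
    """Percentage of positions with identical residues in two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Residue ranges quoted in the text for the homologous region.
len_229e = region_length(29, 134)
len_tgev = region_length(41, 146)

# Hypothetical 10-residue peptides, used only to exercise the function.
toy_identity = percent_identity("MKTAYIAKQR", "MKSAYLAKHR")

print(len_229e, len_tgev, toy_identity)
```

Both ranges span 106 residues, so 46% corresponds to roughly 49 matching positions.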

Another interesting finding is the open reading frame 
internal to the main coding region of the HCV-229E N 
gene. Thus far, two other coronaviruses, BCV and 
MHV-JHM, have been found to contain internal ORFs 
in gene 7 (Skinner and Siddell, 1984; Lapps et al., 1987)
which are preceded by optimum translation initiation 
signals according to Kozak's consensus sequence 
(Kozak, 1983). The predicted amino acid sequences 
could encode hypothetical proteins of molecular 
weights 13,973; 14,842; and 23,057 for HCV-229E, 
MHV-JHM, and BCV, respectively. Interestingly, all 
three sequences are abundant in leucine residues (17 
to 19%). HCV-OC43 also has two smaller internal 
ORFs encoding potential leucine-rich proteins of 8830 
and 16,297 molecular weights (Kamahora et al., 1988).
Further studies to determine whether this hypothetical 
protein can be detected in 229E-infected cells or by in 
vitro translation of a full-length cDNA clone (i.e., L8) are 
in progress. 

Finally, the 3’-noncoding conserved sequence of 
gene 7 lends additional support to a common ancestry 
for coronaviruses, regardless of antigenic subgroup. 
This sequence has been proposed as a recognition site 
for the virus-encoded RNA-dependent RNA polymer- 
ase prior to negative-strand synthesis (Kapke and 
Brian, 1986). Certainly future studies must focus on ex- 
amining the role of this conserved region in the viral 
replication cycle. 